{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ex - GroupBy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Introduction:\n",
    "\n",
    "GroupBy can be summarized as Split-Apply-Combine.\n",
    "\n",
    "Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.\n",
    "\n",
    "Check out this [Diagram](http://i.imgur.com/yjNkiwL.png)  \n",
    "### Step 1. Import the necessary libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 3. Assign it to a variable called drinks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>beer_servings</th>\n",
       "      <th>spirit_servings</th>\n",
       "      <th>wine_servings</th>\n",
       "      <th>total_litres_of_pure_alcohol</th>\n",
       "      <th>continent</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Afghanistan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>AS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Albania</td>\n",
       "      <td>89</td>\n",
       "      <td>132</td>\n",
       "      <td>54</td>\n",
       "      <td>4.9</td>\n",
       "      <td>EU</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Algeria</td>\n",
       "      <td>25</td>\n",
       "      <td>0</td>\n",
       "      <td>14</td>\n",
       "      <td>0.7</td>\n",
       "      <td>AF</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Andorra</td>\n",
       "      <td>245</td>\n",
       "      <td>138</td>\n",
       "      <td>312</td>\n",
       "      <td>12.4</td>\n",
       "      <td>EU</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Angola</td>\n",
       "      <td>217</td>\n",
       "      <td>57</td>\n",
       "      <td>45</td>\n",
       "      <td>5.9</td>\n",
       "      <td>AF</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       country  beer_servings  spirit_servings  wine_servings  \\\n",
       "0  Afghanistan              0                0              0   \n",
       "1      Albania             89              132             54   \n",
       "2      Algeria             25                0             14   \n",
       "3      Andorra            245              138            312   \n",
       "4       Angola            217               57             45   \n",
       "\n",
       "   total_litres_of_pure_alcohol continent  \n",
       "0                           0.0        AS  \n",
       "1                           4.9        EU  \n",
       "2                           0.7        AF  \n",
       "3                          12.4        EU  \n",
       "4                           5.9        AF  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "drinks = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv')\n",
    "drinks.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 4. Which continent drinks more beer on average?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "continent\n",
       "AF     61.471698\n",
       "AS     37.045455\n",
       "EU    193.777778\n",
       "OC     89.687500\n",
       "SA    175.083333\n",
       "Name: beer_servings, dtype: float64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "drinks.groupby('continent').beer_servings.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 5. For each continent print the statistics for wine consumption."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "continent       \n",
       "AF         count     53.000000\n",
       "           mean      16.264151\n",
       "           std       38.846419\n",
       "           min        0.000000\n",
       "           25%        1.000000\n",
       "           50%        2.000000\n",
       "           75%       13.000000\n",
       "           max      233.000000\n",
       "AS         count     44.000000\n",
       "           mean       9.068182\n",
       "           std       21.667034\n",
       "           min        0.000000\n",
       "           25%        0.000000\n",
       "           50%        1.000000\n",
       "           75%        8.000000\n",
       "           max      123.000000\n",
       "EU         count     45.000000\n",
       "           mean     142.222222\n",
       "           std       97.421738\n",
       "           min        0.000000\n",
       "           25%       59.000000\n",
       "           50%      128.000000\n",
       "           75%      195.000000\n",
       "           max      370.000000\n",
       "OC         count     16.000000\n",
       "           mean      35.625000\n",
       "           std       64.555790\n",
       "           min        0.000000\n",
       "           25%        1.000000\n",
       "           50%        8.500000\n",
       "           75%       23.250000\n",
       "           max      212.000000\n",
       "SA         count     12.000000\n",
       "           mean      62.416667\n",
       "           std       88.620189\n",
       "           min        1.000000\n",
       "           25%        3.000000\n",
       "           50%       12.000000\n",
       "           75%       98.500000\n",
       "           max      221.000000\n",
       "dtype: float64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "drinks.groupby('continent').wine_servings.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 6. Print the mean alcohol consumption per continent for every column"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>beer_servings</th>\n",
       "      <th>spirit_servings</th>\n",
       "      <th>wine_servings</th>\n",
       "      <th>total_litres_of_pure_alcohol</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>continent</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>AF</th>\n",
       "      <td>61.471698</td>\n",
       "      <td>16.339623</td>\n",
       "      <td>16.264151</td>\n",
       "      <td>3.007547</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AS</th>\n",
       "      <td>37.045455</td>\n",
       "      <td>60.840909</td>\n",
       "      <td>9.068182</td>\n",
       "      <td>2.170455</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>EU</th>\n",
       "      <td>193.777778</td>\n",
       "      <td>132.555556</td>\n",
       "      <td>142.222222</td>\n",
       "      <td>8.617778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>OC</th>\n",
       "      <td>89.687500</td>\n",
       "      <td>58.437500</td>\n",
       "      <td>35.625000</td>\n",
       "      <td>3.381250</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SA</th>\n",
       "      <td>175.083333</td>\n",
       "      <td>114.750000</td>\n",
       "      <td>62.416667</td>\n",
       "      <td>6.308333</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           beer_servings  spirit_servings  wine_servings  \\\n",
       "continent                                                  \n",
       "AF             61.471698        16.339623      16.264151   \n",
       "AS             37.045455        60.840909       9.068182   \n",
       "EU            193.777778       132.555556     142.222222   \n",
       "OC             89.687500        58.437500      35.625000   \n",
       "SA            175.083333       114.750000      62.416667   \n",
       "\n",
       "           total_litres_of_pure_alcohol  \n",
       "continent                                \n",
       "AF                             3.007547  \n",
       "AS                             2.170455  \n",
       "EU                             8.617778  \n",
       "OC                             3.381250  \n",
       "SA                             6.308333  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "drinks.groupby('continent').mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 7. Print the median alcohol consumption per continent for every column"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>beer_servings</th>\n",
       "      <th>spirit_servings</th>\n",
       "      <th>wine_servings</th>\n",
       "      <th>total_litres_of_pure_alcohol</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>continent</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>AF</th>\n",
       "      <td>32.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2.30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AS</th>\n",
       "      <td>17.5</td>\n",
       "      <td>16.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>EU</th>\n",
       "      <td>219.0</td>\n",
       "      <td>122.0</td>\n",
       "      <td>128.0</td>\n",
       "      <td>10.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>OC</th>\n",
       "      <td>52.5</td>\n",
       "      <td>37.0</td>\n",
       "      <td>8.5</td>\n",
       "      <td>1.75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SA</th>\n",
       "      <td>162.5</td>\n",
       "      <td>108.5</td>\n",
       "      <td>12.0</td>\n",
       "      <td>6.85</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           beer_servings  spirit_servings  wine_servings  \\\n",
       "continent                                                  \n",
       "AF                  32.0              3.0            2.0   \n",
       "AS                  17.5             16.0            1.0   \n",
       "EU                 219.0            122.0          128.0   \n",
       "OC                  52.5             37.0            8.5   \n",
       "SA                 162.5            108.5           12.0   \n",
       "\n",
       "           total_litres_of_pure_alcohol  \n",
       "continent                                \n",
       "AF                                 2.30  \n",
       "AS                                 1.20  \n",
       "EU                                10.00  \n",
       "OC                                 1.75  \n",
       "SA                                 6.85  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "drinks.groupby('continent').median()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 8. Print the mean, min and max values for spirit consumption.\n",
    "#### This time output a DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mean</th>\n",
       "      <th>min</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>continent</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>AF</th>\n",
       "      <td>16.339623</td>\n",
       "      <td>0</td>\n",
       "      <td>152</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AS</th>\n",
       "      <td>60.840909</td>\n",
       "      <td>0</td>\n",
       "      <td>326</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>EU</th>\n",
       "      <td>132.555556</td>\n",
       "      <td>0</td>\n",
       "      <td>373</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>OC</th>\n",
       "      <td>58.437500</td>\n",
       "      <td>0</td>\n",
       "      <td>254</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SA</th>\n",
       "      <td>114.750000</td>\n",
       "      <td>25</td>\n",
       "      <td>302</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 mean  min  max\n",
       "continent                      \n",
       "AF          16.339623    0  152\n",
       "AS          60.840909    0  326\n",
       "EU         132.555556    0  373\n",
       "OC          58.437500    0  254\n",
       "SA         114.750000   25  302"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "drinks.groupby('continent').spirit_servings.agg(['mean', 'min', 'max'])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
